Abstract: This paper considers competitive mobility-on-demand systems where a group of vehicle sharing companies
provide pickup-delivery service in populated areas. The compa- 
nies, on one hand, want to collectively regulate the traffic of the 
user queueing network, and on the other hand, aim to maximize 
their own net profit at each time instant by increasing the 
user delivery and reducing the transition of empty vehicles. We 
formulate the strategic interconnection among the companies as 
a real-time game theoretic coordination problem. We propose 
an algorithm to achieve vehicle balance and practical regulation 
of the user queueing network. We quantify the relation between 
the regulation error and the system parameters (e.g., the 
maximum variation of the user arrival rates). 

I. Introduction 

Private automobiles are not a sustainable solution to per- 
sonal mobility given their drawbacks of energy inefficiency, 
high greenhouse gas emissions and induced traffic congestion. The report [20] shows that in 2010, traffic congestion
caused an annual cost of $101 billion, and drivers spent 4.8
billion hours in traffic in the United States. Mobility-on-demand
(MoD) systems represent a promising paradigm for future 
urban mobility. In particular, MoD systems are one-way 
vehicle-sharing systems where vehicle-sharing companies 
provide sharing vehicles at stations in a geographic region 
of interest, and users drive or are driven by the vehicles 
from a pickup location to a drop-off location. Several pilot 
programs have empirically demonstrated that MoD systems 
are efficient in addressing the drawbacks of private automobiles. In an MoD system, the arrivals and departures of users
are uncorrelated, so it is important to reallocate the
vehicles in real time to match the dynamic and spatial demands. In this
paper, we focus on competitive MoD systems where multiple 
service suppliers compete with one another to maximize 
their own profits. The paper [18] instead studies the scenario 
where there is a single service supplier. 

Literature review. Networked resource allocation among 
competing users has been extensively studied in the context 
of Game Theory. In [2], the authors exploit differential game 
theory to derive Nash equilibrium controllers for multiple 
self-interested users to regulate the traffic of a single queue. 
However, the approach in [2] is not applicable to our problem
due to, e.g., the additional dynamics of vehicle queues, the
nonlinearity and non-smoothness of the dynamic systems and
the presence of state and input constraints. Static games

The authors are with the Laboratory for Information and Decision Systems,
Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, {mhzhu, frazzoli}@mit.edu. This research
was supported in part by the Future Urban Mobility project of the Singapore-MIT Alliance for Research and Technology (SMART) Center, with funding
from Singapore's National Research Foundation.



have also been widely used to synthesize decentralized 
schemes for resource allocation, and a necessarily incomplete 
reference list includes [1], [13], [16], [21], [23]. Another 
relevant problem is demand response in the emerging smart 
grid where customers manage their electricity consumption 
in response to supply conditions. Some references in this
regard include [8], [12].

Our problem is also related to (open-loop) optimization 
and games in dynamic environments. In [5], the authors 
study the problem of seeking the common global optimum 
of a sequence of time-varying functions. The papers [6],
[7] investigate resource allocation of communication systems
over time-varying fading channels. Online convex optimization and games have been considered in the papers [11],
[24].

Another set of papers relevant to our work is concerned 
with generalized Nash games. This class of continuous
games was first formulated in [3]. Since then, a great effort has been dedicated to investigating the existence and
structural properties of generalized Nash equilibria. An in- 
complete reference list includes the recent survey paper [9] 
and [4], [10], [19]. There have been several algorithms 
proposed to compute generalized Nash equilibria, including 
ODE-based methods [19], nonlinear Gauss-Seidel-type ap- 
proaches [17], iterative primal-dual Tikhonov schemes [21] 
and best-response dynamics [15]. In our recent paper [22], 
we consider distributed computation of generalized Nash 
equilibria over unreliable communication networks. 

Contributions. In this paper, we present a model of com- 
petitive MoD systems and formulate the problem of real- 
time game theoretic coordination among multiple players 
(i.e., vehicle sharing companies). In particular, each player 
wants to collectively regulate the traffic of the user queueing 
network through delivering the users to their destinations. On 
the other hand, each player aims to maximize his net profit at 
each time instant by increasing the user delivery and reducing 
the transition of empty vehicles. We propose an algorithm 
to achieve vehicle balance and practical regulation of the 
user queueing network. The closed-loop system consists of 
a feedback connection of the cyber and physical layers: in the 
cyber layer, the players seek instantaneous Nash equilibrium 
in a distributed fashion; the intermediate estimates of Nash 
equilibrium are employed to control the physical queueing 
networks after a proper projection; the states of the queueing 
networks are injected into the cyber layer to keep track of 
Nash equilibrium. We quantify the relation between the reg- 
ulation error and the system parameters (i.e., the maximum 
variation of the user arrival rates). For ease of presentation, 



the notations of Sections III and IV will be introduced and
summarized in the Appendix.

II. Problem formulation 

In this section, we will provide a model for competitive 
MoD systems and introduce the problem formulation considered in the paper. Basic notations used in this section are
summarized in Table I.

TABLE I
Basic notations

$c_\kappa(t)$                   user arrival rate at station $\kappa$
$v^{[i]}_\kappa(t)$             number of vehicles of player $i$ at station $\kappa$
$\beta^{[i]}_\kappa(t)$         delivery rate of player $i$ at station $\kappa$
$\alpha^{[i]}_{\kappa\kappa'}(t)$   transfer rate of empty vehicles of player $i$
$Q_\kappa(t)$                   queue length of station $\kappa$
$u_\kappa(t)$                   controller of station $\kappa$
$\mathbf{1}$                    indicator function
$B_i$                           profit function of player $i$
$C_i$                           cost function of player $i$



A. Model 

A competitive MoD system consists of three intercon- 
nected networks: the user queueing network, the vehicle 
queueing network and the player network. Figure 1 shows
the architecture of the system.

Fig. 1. A competitive MoD system where the companies of shared electrical
vehicles and bicycles serve the area.

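1) The user queueing network: There is a set of stations,
say $\mathcal{S}$, in a spatial area of interest, and the interconnection of
the stations is characterized by the graph $\mathcal{G}_{\mathcal{S}} = \{\mathcal{S}, \mathcal{E}_{\mathcal{S}}\}$ where
$(\kappa, \kappa') \in \mathcal{E}_{\mathcal{S}} \setminus \mathrm{diag}(\mathcal{S})$ if and only if the users at station $\kappa$ can
be delivered to station $\kappa'$. The graph $\mathcal{G}_{\mathcal{S}}$ is fixed, undirected
and connected. Denote by $\mathcal{N}_\kappa = \{\kappa' \in \mathcal{S} \mid (\kappa, \kappa') \in \mathcal{E}_{\mathcal{S}}\}$ the
set of neighboring stations of station $\kappa$.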

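Users arrive at station $\kappa \in \mathcal{S}$ in a dynamic fashion. Let
$c_\kappa(t) \in \mathbb{R}_{>0}$ be the user arrival rate at station $\kappa$ at time $t$; its temporal evolution is governed by the following ordinary
differential equation:

$$\dot{c}_\kappa(t) = h_\kappa(c_\kappa(t), Q_\kappa(t), t). \qquad (1)$$

In (1), $Q_\kappa(t)$ is the queue length of station $\kappa$ and will be
defined later. The function $h_\kappa$ is locally
Lipschitz in $(c_\kappa(t), Q_\kappa(t))$ and piecewise continuous in $t$.
Let $a_{\kappa\kappa'}(t) \in [0,1]$ be the fraction of users who arrive at
station $\kappa$ at time $t$ and want to reach station $\kappa' \neq \kappa$. Thus
$\sum_{\kappa' \in \mathcal{N}_\kappa} a_{\kappa\kappa'}(t) = 1$. We assume that the fraction $a_{\kappa\kappa'}(t)$ is
fixed; i.e., $a_{\kappa\kappa'}(t) = a_{\kappa\kappa'}$ for $t \geq 0$.

A queue is associated with each station $\kappa \in \mathcal{S}$, and the
arriving users wait for delivery in the queue. Let $Q_\kappa(t) \in \mathbb{R}_{\geq 0}$ be the queue length of station $\kappa \in \mathcal{S}$ at time $t$; its
dynamics are given by:

$$\dot{Q}_\kappa(t) = (c_\kappa(t) - u_\kappa(t)) \mathbf{1}_{[Q_\kappa(t) > 0]}, \qquad (2)$$

where the initial state $Q_\kappa(0) > 0$, and the quantity $u_\kappa(t) = \sum_{i \in \mathcal{V}} \beta^{[i]}_\kappa(t) \in \mathbb{R}_{>0}$, where $\beta^{[i]}_\kappa(t)$ is the delivery rate of
player $i$ at station $\kappa$ and $\mathcal{V} = \{1, \cdots, N\}$ is the set of
players explained later.

Denote the vectors $Q = (Q_\kappa)_{\kappa \in \mathcal{S}}$, $c = (c_\kappa)_{\kappa \in \mathcal{S}}$, $\xi = (Q, c)$
and $u = (u_\kappa)_{\kappa \in \mathcal{S}}$. We then rewrite (1) and (2) into the
following compact form:

$$\dot{Q}(t) = h_Q(c(t), u(t)), \qquad (3)$$

$$\dot{c}(t) = h_c(c(t), Q(t), t), \qquad (4)$$

and then,

$$\dot{\xi}(t) = h_\xi(\xi(t), u(t), t). \qquad (5)$$

2) The vehicle queueing network: There is a group of
players $\mathcal{V} = \{1, \cdots, N\}$. Each player is a vehicle-sharing
company, and he provides the service of delivering the users
on the graph $\mathcal{G}_{\mathcal{S}}$. Let $v^{[i]}_\kappa(t) \in \mathbb{R}_{\geq 0}$ be the number of vehicles
of player $i$ stored at station $\kappa$ at time $t$. If $v^{[i]}_\kappa(t) > 0$, then
player $i$ is able to deliver the users leaving station $\kappa$ at a rate
$\beta^{[i]}_\kappa(t) \in [a, \beta^{[i]}_{\max} - a]$ with $0 < a < \frac{1}{2}\beta^{[i]}_{\max}$ (the quantity $a > 0$ could be chosen arbitrarily small); otherwise,
player $i$ cannot deliver any user; i.e., $\beta^{[i]}_\kappa(t) = 0$. In order to
avoid $v^{[i]}_\kappa(t)$ becoming zero, each player $i$ needs to reallocate
his empty vehicles. Let $\alpha^{[i]}_{\kappa\kappa'}(t) \in [a, \alpha^{[i]}_{\max} - a]$ with $0 < a < \frac{1}{2}\alpha^{[i]}_{\max}$ be the rate at which the empty vehicles of player $i$
are transferred from station $\kappa$ to station $\kappa'$ at time $t$. The
dynamics of $v^{[i]}_\kappa(t)$ is based on the mass-conservation law
and given by:

$$\dot{v}^{[i]}_\kappa(t) = \Big\{ -\beta^{[i]}_\kappa(t) + \sum_{\kappa' \in \mathcal{N}_\kappa} a_{\kappa'\kappa} \beta^{[i]}_{\kappa'}(t) - \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa\kappa'}(t) + \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa'\kappa}(t) \Big\} \mathbf{1}_{[v^{[i]}_\kappa(t) > 0]}$$
$$\qquad + \Big\{ \sum_{\kappa' \in \mathcal{N}_\kappa} a_{\kappa'\kappa} \beta^{[i]}_{\kappa'}(t) + \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa'\kappa}(t) \Big\} \mathbf{1}_{[v^{[i]}_\kappa(t) = 0]}, \qquad (6)$$

with the initial state $v^{[i]}_\kappa(0) > 0$. It is easy to verify that
$v^{[i]}_\kappa(t) \geq 0$ for all $t \geq 0$.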
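
The queue dynamics (2) can be explored numerically. The following is a minimal sketch, not part of the original paper: it integrates $\dot{Q}_\kappa = (c_\kappa - u_\kappa)\mathbf{1}_{[Q_\kappa > 0]}$ for a single station with forward Euler. The function name `simulate_queue`, the step size and the constant rates in the usage line are illustrative assumptions.

```python
def simulate_queue(q0, arrival_rate, service_rate, dt=0.01, steps=1000):
    """Forward-Euler sketch of dQ/dt = (c(t) - u(t)) * 1[Q > 0] for one station.

    arrival_rate and service_rate are callables t -> rate (hypothetical inputs).
    """
    q = q0
    trajectory = [q]
    for k in range(steps):
        t = k * dt
        if q > 0.0:  # the indicator in (2): the queue only evolves while non-empty
            q += dt * (arrival_rate(t) - service_rate(t))
            q = max(q, 0.0)  # clip the discretization overshoot at the boundary
        trajectory.append(q)
    return trajectory


# Usage: arrivals at rate 1 and deliveries at rate 2 drain an initial queue of 5.
traj = simulate_queue(q0=5.0, arrival_rate=lambda t: 1.0, service_rate=lambda t: 2.0)
```

Once the simulated queue reaches zero it stays there, which mirrors the indicator in (2).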



3) The player network: Each player in $\mathcal{V}$ has three
partially conflicting objectives: the first one is to collectively
regulate the queue length $Q_\kappa(t)$ to the desired level $\bar{Q}_\kappa \in \mathbb{R}_{>0}$, the second one is to maintain $v^{[i]}_\kappa(t)$ to be strictly
positive, and the third one is to maximize his own net profit.
In the sequel, we will explain each objective in more detail.

Firstly, players aim to collectively regulate the queue
length $Q_\kappa(t)$ to the desired level $\bar{Q}_\kappa$. For the time being,
we assume that there exists a smooth controller $U(\xi(t)) = (U_\kappa(\xi_\kappa(t)))_{\kappa \in \mathcal{S}}$ which is able to achieve the queue regulation. Hence, players share a common goal of enforcing the
controllers $u_\kappa(t) = U_\kappa(\xi_\kappa(t))$ for all $\kappa \in \mathcal{S}$.

Secondly, each player $i$ wants to maintain $v^{[i]}_\kappa(t) > 0$
in order to sustain a non-trivial service rate $\beta^{[i]}_\kappa(t) > 0$.
Since $v^{[i]}_\kappa(0) > 0$, player $i$ can achieve this goal through
simply keeping $v^{[i]}_\kappa(t)$ constant; i.e., enforcing the hard
constraint $\dot{v}^{[i]}_\kappa(t) = 0$ all the time.

Thirdly, each player is self-interested and desires to maximize his net profit at each time instant. In particular, each
player $i$ is able to make a profit from the delivery service,
and the profit is modeled by $B_i(\beta^{[i]}_\kappa(t))$ where the function
$B_i : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ is smooth and strongly concave with
constant $\rho_i > 0$. On the other hand, the transfer of empty
vehicles is costly, and the expense is modeled by $C_i(\alpha^{[i]}_{\kappa\kappa'}(t))$
where $C_i : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ is smooth and strongly convex
with constant $\rho'_i > 0$. The net cost of player $i$ at time $t$ is
abstracted by $\sum_{(\kappa,\kappa') \in \mathcal{E}_{\mathcal{S}}} C_i(\alpha^{[i]}_{\kappa\kappa'}(t)) - \sum_{\kappa \in \mathcal{S}} B_i(\beta^{[i]}_\kappa(t))$.

The decision vector of player $i$ at time $t$ is given by $z^{[i]}(t)$,
which is the collection of $\alpha^{[i]}(t) = (\alpha^{[i]}_{\kappa\kappa'}(t))_{(\kappa,\kappa') \in \mathcal{E}_{\mathcal{S}}}$ and
$\beta^{[i]}(t) = (\beta^{[i]}_\kappa(t))_{\kappa \in \mathcal{S}}$. The above three interests of player $i$
at time $t$ are compactly expressed by the following convex
program parameterized by the vector $U(\xi(t))$:



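$$\min_{z^{[i]}(t) \in \mathbb{R}^{n_i}} f_i(z^{[i]}(t)) \triangleq \sum_{(\kappa,\kappa') \in \mathcal{E}_{\mathcal{S}}} C_i(\alpha^{[i]}_{\kappa\kappa'}(t)) - \sum_{\kappa \in \mathcal{S}} B_i(\beta^{[i]}_\kappa(t)),$$
$$\text{s.t.} \quad \sum_{i \in \mathcal{V}} \beta^{[i]}_\kappa(t) = U_\kappa(\xi_\kappa(t)), \quad \kappa \in \mathcal{S},$$
$$-\beta^{[i]}_\kappa(t) + \sum_{\kappa' \in \mathcal{N}_\kappa} a_{\kappa'\kappa} \beta^{[i]}_{\kappa'}(t) - \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa\kappa'}(t) + \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa'\kappa}(t) = 0, \quad \kappa \in \mathcal{S},$$
$$z^{[i]}(t) \in \mathcal{Z}_i, \qquad (7)$$

where the dimension $n_i = |\mathcal{S}| + 2|\mathcal{E}_{\mathcal{S}}|$ and the set $\mathcal{Z}_i$ is defined
as: $\mathcal{Z}_i \triangleq \{z^{[i]} \mid \beta^{[i]}_\kappa \in [a, \beta^{[i]}_{\max} - a], \ \kappa \in \mathcal{S}, \ \alpha^{[i]}_{\kappa\kappa'} \in [a, \alpha^{[i]}_{\max} - a], \ (\kappa, \kappa') \in \mathcal{E}_{\mathcal{S}}\}$. In (7), the decisions of
the players are coupled via the constraint $\sum_{i \in \mathcal{V}} \beta^{[i]}_\kappa(t) = U_\kappa(\xi_\kappa(t))$, which represents the common goal of the queue
regulation. Other components in (7) are instead separable.

By using $v(x) = 0$ if and only if $v(x) \leq 0$ and $v(x) \geq 0$, we rewrite the parametric convex program (7) into the
following compact form:

$$\min_{z^{[i]}(t) \in \mathbb{R}^{n_i}} f_i(z^{[i]}(t)),$$
$$\text{s.t.} \quad G(\beta^{[i]}(t), \beta^{[-i]}(t), U(\xi(t))) \leq 0,$$
$$h^{[i]}(z^{[i]}(t)) \leq 0, \quad z^{[i]}(t) \in \mathcal{Z}_i, \qquad (8)$$

where $G : \mathbb{R}^{N|\mathcal{S}|} \to \mathbb{R}^m$ ($m = 2|\mathcal{S}|$) and $h^{[i]} : \mathbb{R}^{n_i} \to \mathbb{R}^p$
($p = 2|\mathcal{S}|$) are affine functions. The components of $G$ and
$h^{[i]}$ are asymmetric; i.e., $G_\ell = -G_{\ell + |\mathcal{S}|}$ and $h^{[i]}_\ell = -h^{[i]}_{\ell + |\mathcal{S}|}$
for $1 \leq \ell \leq |\mathcal{S}|$. The collection of (8) will be referred to as
the CVX game parameterized by $U(\xi(t))$, and its solution,
Nash equilibrium, is defined as follows:

Definition 2.1: For the CVX game parameterized by
$U(\xi(t))$, the state $\tilde{z}(t) \in \mathcal{Z} = \prod_{i \in \mathcal{V}} \mathcal{Z}_i$ is a Nash equilibrium
if and only if:

(1) $G(\tilde{\beta}(t), U(\xi(t))) \leq 0$ and $h^{[i]}(\tilde{z}^{[i]}(t)) \leq 0$;

(2) for any $z^{[i]} \in \mathcal{Z}_i$ with $G(\beta^{[i]}, \tilde{\beta}^{[-i]}(t), U(\xi(t))) \leq 0$
and $h^{[i]}(z^{[i]}) \leq 0$, it holds that $f_i(\tilde{z}^{[i]}(t)) \leq f_i(z^{[i]})$.

The set of Nash equilibria is denoted by $X_C(\xi(t))$. Since $f_i$
is strongly convex and separable, the map of partial gradients
is strongly monotone, and thus $X_C(\xi(t))$ is non-empty; see, e.g.,
[9].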

B. Our objective 

At each time instant $t$, the players aim to solve the CVX
game parameterized by the control command $U(\xi(t))$, and
implement a Nash equilibrium in $X_C(\xi(t))$. This procedure is
repeated at the next time instant by shifting the time horizon
forward. We term the collection of these finite-horizon games
over the infinite horizon as a real-time game. In this paper,
we will design an algorithm to update $z(t)$ such that real-time game theoretic coordination is achieved; that is,

$$\lim_{t \to +\infty} \mathrm{dist}(z(t), X_C(\xi(t))) = 0,$$
$$\lim_{t \to +\infty} Q_\kappa(t) = \bar{Q}_\kappa, \quad \kappa \in \mathcal{S},$$
$$v^{[i]}_\kappa(t) = v^{[i]}_\kappa(0), \quad \kappa \in \mathcal{S}, \ i \in \mathcal{V}, \ t \geq 0. \qquad (9)$$

Remark 2.1: In contrast to [2], [4], our real-time game 
theoretic coordination formulation relaxes the computation of 
infinite-horizon Nash equilibria. Instead, our formulation
aims to seek, in real time, the collection of instantaneous Nash
equilibrium. By Lemma [oTTj one can see that if z{t) asymp- 
totically achieves Xc(£(i)), then the infinite-horizon average 
performance of z(t) is identical to that of Xc (£(£)). More 
importantly, the formulation allows us to handle constrained 
discontinuous dynamic systems and relax the a priori infor- 
mation of the arrival rates over the infinite horizon. 

Our game formulation is partially motivated by receding- 
horizon control or model predictive control; e.g., in [14], 
whose control laws are based on solving a sequence of finite 
horizon optimal control problems. Our game formulation is 
also partially inspired by optimization and games in dynamic 
environments; e.g., in [5], [6], [11]. However, this set of 
papers only consider open-loop decision making. • 



C. Assumptions

Let $\beta_{\max} = \sum_{i \in \mathcal{V}} \beta^{[i]}_{\max}$. In the
remainder of this paper, we suppose that the following set
of assumptions holds.

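Assumption 2.1: It holds that $\beta^{[i]}_{\max}$ are identical for all
$i \in \mathcal{V}$ and $c_\kappa(t) \in [c_{\min}, c_{\max}]$ for all $t \geq 0$ and $\kappa \in \mathcal{S}$. In
addition, $2Na \leq c_{\min} \leq c_{\max} \leq \beta_{\max} - Na$.

Assumption 2.2: There is $\delta_c > 0$ such that $\|\dot{c}_\kappa(t)\| \leq \delta_c$
for all $\kappa$ and $t \geq 0$.

Assumption 2.3: For any $\beta^{[i]}$ with $\beta^{[i]}_\kappa \in [a, \beta^{[i]}_{\max} - a]$,
there is $\alpha^{[i]}$ such that $\alpha^{[i]}_{\kappa\kappa'} \in [a, \alpha^{[i]}_{\max} - a]$ and the following
holds for $\kappa \in \mathcal{S}$:

$$-\beta^{[i]}_\kappa + \sum_{\kappa' \in \mathcal{N}_\kappa} a_{\kappa'\kappa} \beta^{[i]}_{\kappa'} - \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa\kappa'} + \sum_{\kappa' \in \mathcal{N}_\kappa} \alpha^{[i]}_{\kappa'\kappa} = 0.$$

Assumption 2.1 requires that the maximum delivery rate
is larger than the maximum arrival rate. This assumption is
necessary for the queue stabilization. Assumption 2.2 means
that the variations of the arrival rates are bounded. The
combination of Assumptions 2.1 and 2.2 implies that there is
$\delta_\xi > 0$ such that $\|\dot{\xi}(t)\| \leq \delta_\xi$ for all $t \geq 0$. Assumption 2.3
implies that given any feasible delivery vector $\beta^{[i]}$, each
player $i$ is able to maintain the vehicle balance at different
stations. Since $\mathcal{G}_{\mathcal{S}}$ is undirected, this assumption requires that
$\alpha^{[i]}_{\max}$ is large enough in comparison with $\beta^{[i]}_{\max}$.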
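
The vehicle-balance condition of Assumption 2.3 can be checked numerically for a candidate pair of delivery and transfer rates. The sketch below is illustrative only. It assumes the balance residual at station $\kappa$ has the form $-\beta_\kappa + \sum_{\kappa'} a_{\kappa'\kappa}\beta_{\kappa'} - \sum_{\kappa'} \alpha_{\kappa\kappa'} + \sum_{\kappa'} \alpha_{\kappa'\kappa}$ (delivered vehicles arrive instantaneously), and the function name, the dictionary layout and the two-station data are hypothetical.

```python
def balance_residual(beta, alpha, frac, neighbors):
    """Net vehicle flow of one player at every station (zero means balanced).

    beta[k]       : delivery rate at station k
    alpha[(k, j)] : empty-vehicle transfer rate from k to j
    frac[(j, k)]  : fraction of users at station j heading to station k
    neighbors[k]  : list of stations adjacent to k
    """
    residual = {}
    for k in neighbors:
        inflow_users = sum(frac[(j, k)] * beta[j] for j in neighbors[k])
        outflow_empty = sum(alpha[(k, j)] for j in neighbors[k])
        inflow_empty = sum(alpha[(j, k)] for j in neighbors[k])
        residual[k] = -beta[k] + inflow_users - outflow_empty + inflow_empty
    return residual


# Usage: two stations; station 0 sends out more loaded vehicles than it receives,
# so more empty vehicles must flow from station 1 to station 0 than the reverse.
neighbors = {0: [1], 1: [0]}
frac = {(0, 1): 1.0, (1, 0): 1.0}
beta = {0: 2.0, 1: 1.0}
alpha = {(0, 1): 0.5, (1, 0): 1.5}
res = balance_residual(beta, alpha, frac, neighbors)
```

In this toy instance the transfer rate from station 1 to station 0 has to exceed the reverse rate by exactly the delivery imbalance, which illustrates why the assumption needs the maximum transfer rate to be large relative to the maximum delivery rate.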

III. Preliminaries 

In the sequel, we will first introduce an approximation of
the CVX game parameterized by $\zeta(t) = U(\xi(t))$, namely,
the regularized game. In order to simplify the notations,
we will drop the dependency of $\zeta$ on time $t$. We will
then characterize the distance between the CVX game and
the regularized game. After this, we will perform sensitivity
analysis on the regularized game.

A. The existence of smooth controllers 

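With Assumption 2.1, we will show that the regulation
of user queues can be achieved via the following smooth
controller:

$$u_\kappa(t) = U_\kappa(\xi_\kappa(t)) = c_\kappa(t) - \hat{U}_\kappa(Q_\kappa(t)) = c_\kappa(t) - \frac{c_{\min}}{2} + \frac{\beta_{\max} - c_{\max} + \frac{c_{\min}}{2} - Na}{1 + \frac{2(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa(t) - \bar{Q}_\kappa)}}. \qquad (10)$$

Towards this end, it is easy to verify that $U_\kappa(\xi_\kappa(t)) = c_\kappa(t)$ when $Q_\kappa(t) = \bar{Q}_\kappa$,
and $U_\kappa(\xi_\kappa(t)) \in [\frac{c_{\min}}{2}, \beta_{\max} - Na] \subseteq [Na, \beta_{\max} - Na]$ by
utilizing the monotonicity of the functions in $U_\kappa$. Hence, the
controller $U_\kappa(\xi_\kappa(t))$ is realizable for the players. Furthermore, the Lie derivative of the regulation error $\frac{1}{2}(Q_\kappa(t) - \bar{Q}_\kappa)^2$ along (2) for $Q_\kappa(t) > 0$ is given by:

$$\frac{1}{2}\frac{d}{dt}(Q_\kappa(t) - \bar{Q}_\kappa)^2 = (Q_\kappa(t) - \bar{Q}_\kappa)\hat{U}_\kappa(Q_\kappa(t))$$
$$= \frac{\beta_{\max} - c_{\max} - Na}{1 + \frac{2(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa(t) - \bar{Q}_\kappa)}} \times (Q_\kappa(t) - \bar{Q}_\kappa)\big(e^{-(Q_\kappa(t) - \bar{Q}_\kappa)} - 1\big). \qquad (11)$$

So, $(Q_\kappa(t) - \bar{Q}_\kappa)\hat{U}_\kappa(Q_\kappa(t)) < 0$ for $Q_\kappa(t) - \bar{Q}_\kappa \neq 0$ and
$Q_\kappa(t) > 0$. Hence, $U_\kappa(\xi_\kappa(t))$ is able to regulate $Q_\kappa(t)$ to
$\bar{Q}_\kappa$ from any initial state $Q_\kappa(0) > 0$. This controller will be
used in the remainder of the paper. In the sequel, we will
find uniform upper bounds on $\|\frac{dU(\xi)}{d\xi}\|$ and $\|\frac{d^2 U(\xi)}{d\xi^2}\|$.

Lemma 3.1: The following holds for all $\xi \geq 0$:

$$\Big\|\frac{dU(\xi)}{d\xi}\Big\| \leq D_U^{(1)}, \qquad \Big\|\frac{d^2 U(\xi)}{d\xi^2}\Big\| \leq D_U^{(2)}. \qquad (12)$$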

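Proof: Notice that

$$\frac{d\hat{U}_\kappa(Q_\kappa)}{dQ_\kappa} = \frac{\beta_{\max} - c_{\max} + \frac{c_{\min}}{2} - Na}{\big(1 + \frac{2(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa - \bar{Q}_\kappa)}\big)^2} \times \frac{(-2)(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa - \bar{Q}_\kappa)},$$

$$\frac{d^2\hat{U}_\kappa(Q_\kappa)}{dQ_\kappa^2} = \frac{2(\beta_{\max} - c_{\max} - Na)\big(\beta_{\max} - c_{\max} + \frac{c_{\min}}{2} - Na\big)}{c_{\min}\big(1 + \frac{2(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa - \bar{Q}_\kappa)}\big)^3} \times \Big(1 - \frac{2(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa - \bar{Q}_\kappa)}\Big) e^{-(Q_\kappa - \bar{Q}_\kappa)},$$

$$\frac{dU_\kappa(\xi_\kappa)}{dc_\kappa} = 1.$$

The above relations in conjunction with $e^{-(Q_\kappa - \bar{Q}_\kappa)} \in [0, e^{\bar{Q}_\kappa}]$ establish the desired bounds. ■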
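
The qualitative properties claimed for the smooth controller can be checked numerically. The following is a minimal sketch under an explicit assumption: the controller is taken to have the logistic form $U_\kappa = c_\kappa - \frac{c_{\min}}{2} + \frac{\beta_{\max} - c_{\max} + c_{\min}/2 - Na}{1 + \frac{2(\beta_{\max} - c_{\max} - Na)}{c_{\min}} e^{-(Q_\kappa - \bar{Q}_\kappa)}}$. The function name `controller` and all parameter values are hypothetical and chosen only to satisfy $2Na \leq c_{\min} \leq c_{\max} \leq \beta_{\max} - Na$.

```python
import math


def controller(c, q, q_ref, c_min, c_max, beta_max, n_players, a):
    """Aggregate delivery rate commanded at one station (assumed logistic form)."""
    margin = beta_max - c_max - n_players * a  # spare delivery capacity
    gain = (margin + 0.5 * c_min) / (
        1.0 + 2.0 * margin / c_min * math.exp(-(q - q_ref))
    )
    return c - 0.5 * c_min + gain


# Hypothetical parameters: N = 2 players, a = 0.5, so N*a = 1.
params = dict(c_min=2.0, c_max=4.0, beta_max=8.0, n_players=2, a=0.5)

u_at_ref = controller(3.0, 5.0, 5.0, **params)  # queue at the desired level
u_long = controller(3.0, 9.0, 5.0, **params)    # queue too long: deliver faster
u_short = controller(3.0, 1.0, 5.0, **params)   # queue too short: deliver slower
```

At the desired queue length the commanded rate equals the arrival rate, so the queue stays put; above it the rate exceeds the arrival rate and below it the rate falls short, which is the sign condition used in the Lie-derivative argument.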

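B. The regularized game

1) Regularized Lagrangian functions: To relax the constraints of $G(\beta, \zeta) \leq 0$ and $h^{[i]}(z^{[i]}) \leq 0$, we define the
following regularized Lagrangian function for player $i$:

$$\mathcal{L}_i(z, \mu, \lambda^{[i]}, \zeta) = f_i(z^{[i]}) + \langle \mu, G(\beta, \zeta) \rangle + \langle \lambda^{[i]}, h^{[i]}(z^{[i]}) \rangle$$
$$\quad - \tau \sum_{\kappa \in \mathcal{S}} \big(\psi(\beta^{[i]}_\kappa) + \psi(\beta^{[i]}_{\max} - \beta^{[i]}_\kappa)\big) - \tau \sum_{(\kappa,\kappa') \in \mathcal{E}_{\mathcal{S}}} \big(\psi(\alpha^{[i]}_{\kappa\kappa'}) + \psi(\alpha^{[i]}_{\max} - \alpha^{[i]}_{\kappa\kappa'})\big)$$
$$\quad - \frac{\epsilon}{2}\|\mu\|^2 - \frac{\epsilon}{2}\|\lambda^{[i]}\|^2 + \tau \sum_{\ell=1}^{m} \psi(\mu_\ell) + \tau \sum_{\ell=1}^{p} \psi(\lambda^{[i]}_\ell), \qquad (13)$$

with $\epsilon > 0$, $\tau > 0$, and $\mu \in \mathbb{R}^m$ and $\lambda^{[i]} \in \mathbb{R}^p$ are dual
multipliers. The function $\psi$ is the logarithmic barrier function
and defined as follows:

$$\psi(s) = \log\Big(\frac{s}{a}\Big), \quad s > 0,$$
$$\psi(s) = -\infty, \quad s \leq 0.$$

Note that $\psi$ is concave and monotonically increasing over
$\mathbb{R}_{>0}$. In $\mathcal{L}_i$, the hard constraints $\mu \geq 0$, $\lambda^{[i]} \geq 0$ and
$z^{[i]} \in \mathcal{Z}_i$ are relaxed by those defined via the logarithmic
function. In addition, the terms associated with $\epsilon$ play a role
of regularization as shown in Lemma 3.2.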

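We then introduce a set of dual players $\{0\} \cup \mathcal{V}_m$ with $\mathcal{V}_m = \{1, \cdots, N\}$, where $\mu$ is the decision vector of dual player 0,
and $\lambda^{[i]}$ is the decision vector of dual player $i$. Each primal
player $i \in \mathcal{V}$ aims to minimize $\mathcal{L}_i$ over $z^{[i]} \in \mathbb{R}^{n_i}$. Each dual
player $i \in \mathcal{V}_m$ desires to maximize $\mathcal{L}_i$ over $\lambda^{[i]} \in \mathbb{R}^p$ and
dual player 0 wants to maximize $\mathcal{H}(z, \mu, \zeta) = \langle \mu, G(\beta, \zeta) \rangle - \frac{\epsilon}{2}\|\mu\|^2 + \tau \sum_{\ell=1}^{m} \psi(\mu_\ell)$. This game is referred to as the
regularized game (RG game, for short) parameterized by $\zeta$
and the definition of its NEs is given as follows:

Definition 3.1: The state $(\tilde{z}, \tilde{\mu}, \tilde{\lambda}) \in \mathbb{R}^{n+m+p}$ is a Nash
equilibrium of the RG game parameterized by $\zeta$ if and only
if the following hold for each primal player $i \in \mathcal{V}$:

$$\mathcal{L}_i(\tilde{z}, \tilde{\mu}, \tilde{\lambda}^{[i]}, \zeta) \leq \mathcal{L}_i(z^{[i]}, \tilde{z}^{[-i]}, \tilde{\mu}, \tilde{\lambda}^{[i]}, \zeta), \quad \forall z^{[i]} \in \mathbb{R}^{n_i},$$

the following hold for each dual player $i \in \mathcal{V}_m$:

$$\mathcal{L}_i(\tilde{z}, \tilde{\mu}, \lambda^{[i]}, \zeta) \leq \mathcal{L}_i(\tilde{z}, \tilde{\mu}, \tilde{\lambda}^{[i]}, \zeta), \quad \forall \lambda^{[i]} \in \mathbb{R}^p,$$

and the following hold for dual player 0:

$$\mathcal{H}(\tilde{z}, \mu, \zeta) \leq \mathcal{H}(\tilde{z}, \tilde{\mu}, \zeta), \quad \forall \mu \in \mathbb{R}^m.$$

The set of NEs of the RG game parameterized by $\zeta$ is
denoted as $X_{RG}(\zeta)$. Since the logarithmic barrier function $\psi$
penalizes $\mu \notin \mathbb{R}^m_{>0}$ with an infinite cost, it must be that $\tilde{\mu}(\zeta) > 0$
for any NE $\tilde{\eta}(\zeta) \in X_{RG}(\zeta)$. Analogously, $\tilde{\lambda}^{[i]}(\zeta) > 0$,
$\tilde{\beta}^{[i]}_\kappa(\zeta) \in (0, \beta^{[i]}_{\max})$ and $\tilde{\alpha}^{[i]}_{\kappa\kappa'}(\zeta) \in (0, \alpha^{[i]}_{\max})$ for any NE
$\tilde{\eta}(\zeta) \in X_{RG}(\zeta)$.

2) Convexity of the RG game: Since $f_i$ is strongly convex
in $z^{[i]}$ and $\psi$ is concave over $\mathbb{R}_{>0}$, $\mathcal{L}_i$ is strongly convex
in $z^{[i]} \in \hat{\mathcal{Z}}_i \triangleq \{z^{[i]} \in \mathbb{R}^{n_i} \mid \beta^{[i]}_\kappa \in [0, \beta^{[i]}_{\max}], \ \kappa \in \mathcal{S}, \ \alpha^{[i]}_{\kappa\kappa'} \in [0, \alpha^{[i]}_{\max}], \ (\kappa, \kappa') \in \mathcal{E}_{\mathcal{S}}\}$ with constant
$\min\{\rho_i, \rho'_i\}$. By introducing the quadratic perturbation of
$\frac{\epsilon}{2}\|\lambda^{[i]}\|^2$, the function $\mathcal{L}_i(z, \mu, \cdot, \zeta)$ is strongly concave
in $\lambda^{[i]}$ with constant $\epsilon$ over $\mathbb{R}^p_{>0}$. This can be verified via
the following computation:

$$\frac{\partial^2 \mathcal{L}_i}{\partial (\lambda^{[i]}_\ell)^2} = -\epsilon - \frac{\tau}{(\lambda^{[i]}_\ell)^2} \leq -\epsilon. \qquad (14)$$

Analogously, the function $\mathcal{H}(z, \mu, \zeta)$ is strongly concave in
$\mu$ with constant $\epsilon$ over $\mathbb{R}^m_{>0}$.

3) Monotonicity of the RG game: It is noted that all the
functions involved in $\mathcal{L}_i$ are smooth in $\hat{\mathcal{Z}} \times \mathbb{R}^m_{>0} \times \mathbb{R}^p_{>0}$. We
then define $\nabla_{z^{[i]}} \mathcal{L}_i(z, \mu, \lambda^{[i]}, \zeta)$
as the partial gradient of the function $\mathcal{L}_i(\cdot, z^{[-i]}, \mu, \lambda^{[i]}, \zeta)$ at
$z^{[i]}$. Other partial gradients can be defined in an analogous
way. Let $\eta = (z, \mu, \lambda)$, and define the map $\nabla\Omega : \hat{\mathcal{Z}} \times \mathbb{R}^m_{>0} \times \mathbb{R}^p_{>0} \to \mathbb{R}^{n+m+p}$ as partial gradients of the players'
objective functions:

$$\nabla\Omega(\eta, \zeta) \triangleq \big[\nabla_{z^{[1]}} \mathcal{L}_1(z, \mu, \lambda^{[1]}, \zeta)^T \ \cdots \ \nabla_{z^{[N]}} \mathcal{L}_N(z, \mu, \lambda^{[N]}, \zeta)^T \ \ -\nabla_\mu \mathcal{H}(z, \mu, \zeta)^T$$
$$\qquad -\nabla_{\lambda^{[1]}} \mathcal{L}_1(z, \mu, \lambda^{[1]}, \zeta)^T \ \cdots \ -\nabla_{\lambda^{[N]}} \mathcal{L}_N(z, \mu, \lambda^{[N]}, \zeta)^T\big]^T.$$

The following lemma shows that the quadratic perturbations of $\frac{\epsilon}{2}\|\mu\|^2$ and $\frac{\epsilon}{2}\|\lambda^{[i]}\|^2$ regularize the game map $\nabla\Omega$
to be strongly monotone over $\hat{\mathcal{Z}} \times \mathbb{R}^{m+p}_{>0}$.

Lemma 3.2: The regularized game map $\nabla\Omega(\eta, \zeta)$ is
strongly monotone over $\hat{\mathcal{Z}} \times \mathbb{R}^{m+p}_{>0}$ with constant $\rho_\Omega = \min\{\min_{i \in \mathcal{V}}\{\rho_i, \rho'_i\}, \epsilon\}$. In addition, there is a unique NE
$\tilde{\eta}(\zeta) \in X_{RG}(\zeta)$.

Proof: Pick any pair of $\eta, \bar{\eta} \in \hat{\mathcal{Z}} \times \mathbb{R}^{m+p}_{>0}$. Since $G$ and
$h^{[i]}$ are affine, one can verify that

$$\langle \nabla\Omega(\eta, \zeta) - \nabla\Omega(\bar{\eta}, \zeta), \eta - \bar{\eta} \rangle = \sum_{i \in \mathcal{V}} \langle \nabla f_i(z^{[i]}) - \nabla f_i(\bar{z}^{[i]}), z^{[i]} - \bar{z}^{[i]} \rangle$$
$$\qquad + \epsilon\|\mu - \bar{\mu}\|^2 + \epsilon \sum_{i \in \mathcal{V}} \|\lambda^{[i]} - \bar{\lambda}^{[i]}\|^2 + B_1 + B_2 + B_3, \qquad (15)$$

where the terms of $B_1$, $B_2$ and $B_3$ are given by:

$$B_1 = -\tau \sum_{i \in \mathcal{V}} \sum_{\kappa \in \mathcal{S}} \Big(\frac{1}{\beta^{[i]}_\kappa} - \frac{1}{\bar{\beta}^{[i]}_\kappa}\Big)(\beta^{[i]}_\kappa - \bar{\beta}^{[i]}_\kappa) + \tau \sum_{i \in \mathcal{V}} \sum_{\kappa \in \mathcal{S}} \Big(\frac{1}{\beta^{[i]}_{\max} - \beta^{[i]}_\kappa} - \frac{1}{\beta^{[i]}_{\max} - \bar{\beta}^{[i]}_\kappa}\Big)(\beta^{[i]}_\kappa - \bar{\beta}^{[i]}_\kappa),$$

$$B_2 = -\tau \sum_{i \in \mathcal{V}} \sum_{(\kappa,\kappa') \in \mathcal{E}_{\mathcal{S}}} \Big(\frac{1}{\alpha^{[i]}_{\kappa\kappa'}} - \frac{1}{\bar{\alpha}^{[i]}_{\kappa\kappa'}}\Big)(\alpha^{[i]}_{\kappa\kappa'} - \bar{\alpha}^{[i]}_{\kappa\kappa'}) + \tau \sum_{i \in \mathcal{V}} \sum_{(\kappa,\kappa') \in \mathcal{E}_{\mathcal{S}}} \Big(\frac{1}{\alpha^{[i]}_{\max} - \alpha^{[i]}_{\kappa\kappa'}} - \frac{1}{\alpha^{[i]}_{\max} - \bar{\alpha}^{[i]}_{\kappa\kappa'}}\Big)(\alpha^{[i]}_{\kappa\kappa'} - \bar{\alpha}^{[i]}_{\kappa\kappa'}),$$

$$B_3 = \tau \sum_{\ell=1}^{m} (\log(\mu_\ell) - \log(\bar{\mu}_\ell))(\mu_\ell - \bar{\mu}_\ell) + \tau \sum_{i \in \mathcal{V}} \sum_{\ell=1}^{p} (\log(\lambda^{[i]}_\ell) - \log(\bar{\lambda}^{[i]}_\ell))(\lambda^{[i]}_\ell - \bar{\lambda}^{[i]}_\ell).$$

By using the monotonicity of the functions in $B_1$, $B_2$, $B_3$, it is
readily verified that $B_1, B_2, B_3 \geq 0$. Applying the mean-value
theorem for vector functions and using that $f_i$ is strongly convex with
constant $\min\{\rho_i, \rho'_i\}$, we then reach the desired result
that $\nabla\Omega$ is strongly monotone over $\hat{\mathcal{Z}} \times \mathbb{R}^{m+p}_{>0}$ with constant
$\rho_\Omega$. The strong monotonicity of $\nabla\Omega$ ensures the existence
and uniqueness of the NE in $X_{RG}(\zeta)$; e.g., in [9]. ■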



C. Approximation errors of the RG game 

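As mentioned before, in the RG game, the hard constraints
$\mu_\ell \geq 0$, $\lambda^{[i]}_\ell \geq 0$ and $z^{[i]} \in \mathcal{Z}_i$ are relaxed by those
defined via the logarithmic function. Hence, the RG game is
completely unconstrained. This will allow us to characterize
the sensitivity of the RG game on $\zeta$. However, the RG game
is merely an approximation of the CVX game. We now move
to characterize how good this approximation is.

By the convexity or concavity of $\mathcal{L}_i$ on its components,
the following first-order conditions hold for the unique NE
$\tilde{\eta}(\zeta) \in X_{RG}(\zeta)$:

$$\nabla_{z^{[i]}} \mathcal{L}_i(\tilde{z}, \tilde{\mu}, \tilde{\lambda}^{[i]}, \zeta) = 0, \quad \nabla_{\lambda^{[i]}} \mathcal{L}_i(\tilde{z}, \tilde{\mu}, \tilde{\lambda}^{[i]}, \zeta) = 0, \quad \nabla_\mu \mathcal{H}(\tilde{z}, \tilde{\mu}, \zeta) = 0. \qquad (16)$$

These relations are explicitly expressed as follows:

$$\nabla_{\beta^{[i]}_\kappa} f_i(\tilde{z}^{[i]}) + \sum_{\ell=1}^{m} \tilde{\mu}_\ell \nabla_{\beta^{[i]}_\kappa} G_\ell(\tilde{\beta}, \zeta) + \sum_{\ell=1}^{p} \tilde{\lambda}^{[i]}_\ell \nabla_{\beta^{[i]}_\kappa} h^{[i]}_\ell(\tilde{z}^{[i]}) - \frac{\tau}{\tilde{\beta}^{[i]}_\kappa} + \frac{\tau}{\beta^{[i]}_{\max} - \tilde{\beta}^{[i]}_\kappa} = 0, \quad \kappa \in \mathcal{S}, \qquad (17)$$

$$\nabla_{\alpha^{[i]}_{\kappa\kappa'}} f_i(\tilde{z}^{[i]}) + \sum_{\ell=1}^{p} \tilde{\lambda}^{[i]}_\ell \nabla_{\alpha^{[i]}_{\kappa\kappa'}} h^{[i]}_\ell(\tilde{z}^{[i]}) - \frac{\tau}{\tilde{\alpha}^{[i]}_{\kappa\kappa'}} + \frac{\tau}{\alpha^{[i]}_{\max} - \tilde{\alpha}^{[i]}_{\kappa\kappa'}} = 0, \quad (\kappa, \kappa') \in \mathcal{E}_{\mathcal{S}}, \qquad (18)$$

$$\epsilon\tilde{\mu}_\ell - G_\ell(\tilde{\beta}, \zeta) - \frac{\tau}{\tilde{\mu}_\ell} = 0, \quad \ell = 1, \cdots, m, \qquad (19)$$

$$\epsilon\tilde{\lambda}^{[i]}_\ell - h^{[i]}_\ell(\tilde{z}^{[i]}) - \frac{\tau}{\tilde{\lambda}^{[i]}_\ell} = 0, \quad \ell = 1, \cdots, p. \qquad (20)$$

Since $\tilde{\mu}_\ell, \tilde{\lambda}^{[i]}_\ell > 0$, solving (19) and (20) renders the
following:

$$\tilde{\mu}_\ell = g(G_\ell(\tilde{\beta}, \zeta)) > 0, \quad \tilde{\lambda}^{[i]}_\ell = g(h^{[i]}_\ell(\tilde{z}^{[i]})) > 0, \qquad (21)$$

where $g(s) = \frac{s + \sqrt{s^2 + 4\epsilon\tau}}{2\epsilon}$ is the positive root of $\epsilon x^2 - s x - \tau = 0$.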
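
The closed-form multiplier in (21) can be sanity-checked numerically. The sketch below assumes the stationarity condition for a dual variable has the scalar form $\epsilon x - s - \tau / x = 0$, as in (19) and (20), whose positive root is $g(s) = (s + \sqrt{s^2 + 4\epsilon\tau}) / (2\epsilon)$. The function name and the values of $\epsilon$, $\tau$ and $s$ are illustrative assumptions.

```python
import math


def g(s, eps, tau):
    """Positive root of eps*x**2 - s*x - tau = 0 (regularized multiplier)."""
    return (s + math.sqrt(s * s + 4.0 * eps * tau)) / (2.0 * eps)


eps, tau = 0.1, 0.01
samples = [-3.0, -1.0, 0.0, 0.5, 2.0]  # constraint values G_l or h_l (hypothetical)
multipliers = [g(s, eps, tau) for s in samples]

# Residual of the stationarity condition eps*x - s - tau/x at each root.
residuals = [eps * x - s - tau / x for s, x in zip(samples, multipliers)]
```

The multiplier stays strictly positive even when the constraint value is negative, and it grows with the constraint value, which is the monotonicity used later to bound the multipliers.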

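The following proposition verifies that the RG game can
be rendered arbitrarily close to the CVX game by choosing
a pair of sufficiently small $\epsilon$ and $\tau$. In particular, (P1) and
(P2) show that the violation of equality constraints is at most
$\max\{\varsigma_h(\epsilon, \tau), \varsigma_G(\epsilon, \tau)\} = O(\epsilon, \tau)$. (P3) implies that the cost
at the NE $\tilde{\eta}(\zeta)$ is $O(\tau)$-suboptimal. (P4) and (P5) provide a
set of bounds on NEs.

Proposition 3.1: The unique NE $\tilde{\eta}(\zeta) \in X_{RG}(\zeta)$ is an
approximation of $X_C(\zeta)$ in the following way:

(P1) $|h^{[i]}_\kappa(\tilde{z}^{[i]})| \leq \varsigma_h(\epsilon, \tau)$;

(P2) $|G_\kappa(\tilde{\beta})| \leq \varsigma_G(\epsilon, \tau)$;

(P3) The following holds for any $z^{[i]} \in \mathcal{Z}_i$ with
$G(\beta^{[i]}, \tilde{\beta}^{[-i]}, \zeta) \leq 0$ and $h^{[i]}(z^{[i]}) \leq 0$:

$$f_i(\tilde{z}^{[i]}) \leq f_i(z^{[i]}) + \tau(p + m) + 2\tau\big(|\mathcal{S}|\log(\beta^{[i]}_{\max}) + |\mathcal{E}_{\mathcal{S}}|\log(\alpha^{[i]}_{\max})\big).$$

(P4) It holds that for $i \in \mathcal{V}$:

$$g(-\varsigma_G(\epsilon, \tau)) \leq \tilde{\mu}_\ell(\zeta) \leq g(\varsigma_G(\epsilon, \tau)),$$
$$g(-\varsigma_h(\epsilon, \tau)) \leq \tilde{\lambda}^{[i]}_\ell(\zeta) \leq g(\varsigma_h(\epsilon, \tau)).$$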
(P5) It holds that 



max 



2r + ^ ax -^ 2r + ^, 



-'max 
' u max 



2t + S'/a 



< a w , < aw 

— kk — L max 



max 



max 



2r + <5''a 



max 



Proof: In order to simplify the notations, we drop the 
dependency on the parameter £ unless necessary. 
Claim 1: (PI) holds. 

Proof: Since 77 is an NE, then the following holds for 

any G l n ": 

A(z,^A,)~A^ [l U H1 ,/AA 4 ) <0, 



-r^(^(/3W)+^i x -^])) 

reGS 

-r E (V^Lt)+^(« 

(re,re')e£s 



-8 

max 



0) 



reSS 



W/f )+V<(/fL-/f )) 



(re,re')efs 

Choose = p^. By Assumption 



(22) 



2.3 



there is a with 

, € [a,aSax - a] such that h^(z^~0. Substitute z® 



into ( |22| i, and we have 

/..(^Wj + ^W.ftW^W)) 



- E 

</i(^ [<1 ) + <A [< U [i] (^ ] )> 

- E 

(k,k')g£ s 



- » 



0) 



■ a 11 . 



))■ 



(23) 



Notice that the last two terms on the right-hand side of ( |23| ) 
are non-positive. So it follows from ( |23] l that 

(A [i] ,/i H (z w )) < 5i. (24) 

By ( |20| >, it is easy to see that Ak/j$(SM) are lower 
bounded by — t. Substitute these relations into ( p4] >, and it 
gives that 

~X^h^(^)<S l + (p-l)r, (25) 

Consider the first case of hl\z^) > 0. Substitute Q 
into (|25|l, and it renders that 



2h®(z®) 2 < ^ 1 (5 [iI ) 2 + ^ I (5 [iI )V /l ^ ? J (^ [il ) 2 +4er 
< 2e(5i + {p - l)r). 

Hence, ^(jgM) <ft(e,r). 

Consider the second case of /iL i] (zH) < 0. By the 
asymmetry of the components in tv-*', there is «' 7^ k such 
that /i^(fW) = and h$(z®) > 0. Follow the 

above steps, and we have h$(z^) < ?/j(e, r). Hence, we 
have > — ^(e, r). The combination of the above 

two cases establishes (PI). ■ 
Claim 2: (P2) holds. 

Proof: Pick any 1 < k < |S| and let k' = K + f. Then 
G K 0) = -G K ,0). 

Consider the first case of G K (/3) > 0. Recall that 
YsievP^ - C« = G K 0). So there exists i G V such that 
^(C K + G K 0)) < 0. Choose /3H such that = % and 
ftj = f3 l j for k' ± k. Recall that % eja, Hence, 

there is such 



2.3 



/3« € [a,/9max — a]. By Assumption _ 
that G [a,aS ax - a] and < 0. 



By j2T) and G K {fi) = -G K ,{p), we have fj, K - fr K , 
\G K (f5). Hence, we have 

{il,G0)-G0)) 

= MG K $ K ) - G K 0)) + ix K .(G K .{fi K >) - G K ,0)) 



_ G K 0) . G K 0)^_G K 0) 2 



iV 



iV 



Ne 



(26) 



Recall that AL i] ft| ] (z H ) > -r. Substitute into p2) , 
an d it gives tha t G K (,3) 2 < iVe(tfj + pr). Hence, G K (j3) < 
^/Ne(S t +pT) when G K (/3) > 0. 

Analogous to Claim 1, we have G K (/3) > 
— y / Ne(Si + pr) when G K (j3) < 0. The combination 
of the above two cases establishes (P2). ■ 

Claim 3: (P3) holds. 

Proof: It is a direct result of the relation (22). ■

Claim 4: (P4) holds. 

Proof: It is easy to verify that the first-order derivative
$g'(s) = \frac{s + \sqrt{s^2 + 4\epsilon\tau}}{2\epsilon\sqrt{s^2 + 4\epsilon\tau}} > 0$ and the second-order derivative
$g''(s) = \frac{2\tau}{(s^2 + 4\epsilon\tau)^{3/2}} > 0$. Hence, the function $g$ is strictly
increasing and strictly convex. From (21) and (P1), we
establish the desired bounds on $\tilde{\mu}_\ell$ and $\tilde{\lambda}^{[i]}_\ell$. ■
Claim 5: (P5) holds. 
Proof: It is noted that 

m p 

II V^i /i (5 H ) + E WV^, G<(& + E *f V ^ ^ (* [<1 ) I 



e=i 



p 



< sup Iiv^/^II + EINI + Ell^ll^^- 

at*] /_i 



(27) 



Assume /3k < 



< 



Then we have 
2r 



< <5- - + -n- < 0. 
an an on 



an am 

This contradicts ( fl7] >. So it must be ^ > 
remainder of (P5) can be shown in an analogous way. 



The 



This completes the proof of Proposition 3.1.

Remark 3.1: The bounds on $\beta^{[i]}_\kappa$ and $\alpha^{[i]}_{\kappa\kappa'}$ are shifted by
$a$, and the argument in $\psi$ is scaled by $a$. In this way the last
two terms on the right-hand side of (23) are non-positive for
any $z^{[i]} \in \mathcal{Z}_i$. •

D. Sensitivity analysis 

As mentioned before, the RG game is completely unconstrained. This allows us to perform sensitivity analysis on the
RG game, and characterize the variation of the NE $\tilde{\eta}(U(\xi))$
induced by the variation of $\xi$. In this part, we will drop the
dependency of the NE $\tilde{\eta}(\zeta)$ on $\zeta$ unless necessary.



Toward this end, we denote a set of matrices as follows: 



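$$R_1(\tilde{\eta}) \triangleq \mathrm{diag}\big(\nabla_{z^{[1]}z^{[1]}} \mathcal{L}_1, \cdots, \nabla_{z^{[N]}z^{[N]}} \mathcal{L}_N\big),$$

$$R_2 \triangleq \begin{bmatrix} \nabla_{z^{[1]}} G_1(\tilde{\beta}, \zeta)^T & \cdots & \nabla_{z^{[N]}} G_1(\tilde{\beta}, \zeta)^T \\ \vdots & & \vdots \\ \nabla_{z^{[1]}} G_m(\tilde{\beta}, \zeta)^T & \cdots & \nabla_{z^{[N]}} G_m(\tilde{\beta}, \zeta)^T \end{bmatrix},$$

$$R_3 \triangleq \mathrm{diag}\big([\nabla_{z^{[i]}} h^{[i]}_1(\tilde{z}^{[i]})^T, \cdots, \nabla_{z^{[i]}} h^{[i]}_p(\tilde{z}^{[i]})^T]\big)_{i \in \mathcal{V}},$$

$$R_4(\tilde{\eta}) \triangleq \mathrm{diag}\Big(\epsilon + \frac{\tau}{\tilde{\mu}_\ell^2}\Big)_{\ell = 1, \cdots, m},$$

$$R_5(\tilde{\eta}) \triangleq \mathrm{diag}\Big(\mathrm{diag}\Big(\epsilon + \frac{\tau}{(\tilde{\lambda}^{[i]}_\ell)^2}\Big)_{\ell = 1, \cdots, p}\Big)_{i \in \mathcal{V}}.$$

Recall that $G$ and $h^{[i]}$ are affine. Then $\nabla_{z^{[i]}z^{[j]}} \mathcal{L}_i = 0$ if
$i \neq j$. Since $\mathcal{L}_i$ is separable in its components, $R_1(\tilde{\eta}, \zeta)$
is diagonal, symmetric and positive definite. In addition, $R_2$
and $R_3$ are constant due to $G$ and $h^{[i]}$ being affine.

With the above notations at hand, we can derive the partial
derivative of the left-hand side of (16) with respect to $\tilde{\eta}$
evaluated at $(\tilde{\eta}, \zeta)$, and this derivative is given by:

$$J_M(\tilde{\eta}) = \begin{bmatrix} R_1(\tilde{\eta}) & R_2^T & R_3^T \\ -R_2 & R_4(\tilde{\eta}) & 0 \\ -R_3 & 0 & R_5(\tilde{\eta}) \end{bmatrix}.$$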

Let $J_N$ be the partial derivative of the left-hand side
of (17) to (20) with respect to $\zeta$. Since $G$ is affine in $\zeta$,
$J_N$ is state-independent. We then denote

$$J(\tilde{\eta}) \triangleq -J_M(\tilde{\eta})^{-1} J_N, \qquad (28)$$

where $J_M(\tilde{\eta})$ will be shown to be non-singular in the
following lemma.

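Lemma 3.3: The matrix $J_M(\tilde{\eta}(\zeta))$ is non-singular, positive definite and its spectrum is uniformly lower bounded by
$\epsilon \min_{i \in \mathcal{V}}\{\rho_i, \rho'_i\} > 0$. In addition, $J(\tilde{\eta}(\zeta))$ is continuously
differentiable in $\zeta$, and the following relation holds:

$$\frac{d\tilde{\eta}(\zeta)}{d\zeta} = J(\tilde{\eta}(\zeta)). \qquad (29)$$
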
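Proof: Recall the following identity for non-singular
$A_1$:

$$\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} = \begin{bmatrix} A_1 & 0 \\ A_3 & I \end{bmatrix} \begin{bmatrix} I & A_1^{-1}A_2 \\ 0 & A_4 - A_3 A_1^{-1} A_2 \end{bmatrix}. \qquad (30)$$

By (30), the determinant of $J_M(\tilde{\eta})$ is computed as follows:

$$\det(J_M(\tilde{\eta})) = \det\Big(\begin{bmatrix} R_4(\tilde{\eta}) & 0 \\ 0 & R_5(\tilde{\eta}) \end{bmatrix}\Big) \det(T_1(\tilde{\eta})),$$

where $T_1(\tilde{\eta})$ is given by:

$$T_1(\tilde{\eta}) = R_1(\tilde{\eta}) + T_2(\tilde{\eta}), \qquad T_2(\tilde{\eta}) \triangleq \begin{bmatrix} R_2^T & R_3^T \end{bmatrix} \begin{bmatrix} R_4(\tilde{\eta}) & 0 \\ 0 & R_5(\tilde{\eta}) \end{bmatrix}^{-1} \begin{bmatrix} R_2 \\ R_3 \end{bmatrix}.$$

Recall that $R_1(\tilde{\eta})$, $R_4(\tilde{\eta})$ and $R_5(\tilde{\eta})$ are symmetric and
diagonal. So $T_1(\tilde{\eta})$ and $T_2(\tilde{\eta})$ are symmetric. Since $R_4(\tilde{\eta})$
and $R_5(\tilde{\eta})$ are positive definite and diagonal, $T_2(\tilde{\eta})$ is
positive semi-definite. Recall $R_1(\tilde{\eta})$ is positive definite. By
using $\lambda_{\min}(A_1 + A_2) \geq \lambda_{\min}(A_1) + \lambda_{\min}(A_2)$, we know
that $T_1(\tilde{\eta})$ is positive definite. Hence, $\det(T_1(\tilde{\eta})) \neq 0$ and
$\det(J_M(\tilde{\eta})) \neq 0$. This implies that $J_M(\tilde{\eta})$ is non-singular.

By (30) again, the spectrum of $J_M(\tilde{\eta})$ is lower bounded as
follows:

$$\lambda_{\min}(J_M(\tilde{\eta})) \geq \lambda_{\min}\Big(\begin{bmatrix} R_4(\tilde{\eta}) & 0 \\ 0 & R_5(\tilde{\eta}) \end{bmatrix}\Big) \lambda_{\min}(T_1(\tilde{\eta})) \geq \epsilon \lambda_{\min}(R_1(\tilde{\eta})) \geq \epsilon \min_{i \in \mathcal{V}}\{\rho_i, \rho'_i\}.$$

Recall that $J_M(\tilde{\eta})$ is non-singular. By the implicit function
theorem, we reach that $\tilde{\eta}(\zeta)$ and $J(\tilde{\eta}(\zeta))$ are continuously
differentiable in $\zeta$ and the derivative of $\tilde{\eta}$ with respect to $\zeta$
is given by:

$$\frac{d\tilde{\eta}(\zeta)}{d\zeta} = J(\tilde{\eta}(\zeta)). \qquad (31)$$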
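
The sensitivity relation can be illustrated on a scalar toy problem. The sketch below is not the paper's full block system: it assumes a single stationarity condition $F(\eta, s) = \epsilon\eta - s - \tau/\eta = 0$, for which the analogue of $J_M$ is $\partial F / \partial \eta = \epsilon + \tau/\eta^2$, the analogue of $J_N$ is $\partial F / \partial s = -1$, and the implicit function theorem gives $d\eta/ds = -J_M^{-1} J_N$. The result is compared with a central finite difference of the closed-form root. All names and parameter values are hypothetical.

```python
import math


def g(s, eps, tau):
    """Positive root of eps*x - s - tau/x = 0."""
    return (s + math.sqrt(s * s + 4.0 * eps * tau)) / (2.0 * eps)


def sensitivity(s, eps, tau):
    """d(eta)/ds from the implicit function theorem, -J_M^{-1} * J_N (scalar case)."""
    eta = g(s, eps, tau)
    j_m = eps + tau / eta ** 2  # derivative of the residual w.r.t. eta (always >= eps)
    j_n = -1.0                  # derivative of the residual w.r.t. the parameter s
    return -j_n / j_m


eps, tau, s, step = 0.1, 0.01, 0.5, 1e-6
analytic = sensitivity(s, eps, tau)
numeric = (g(s + step, eps, tau) - g(s - step, eps, tau)) / (2.0 * step)
```

Because the scalar $J_M$ is bounded below by $\epsilon$, the sensitivity is bounded above by $1/\epsilon$, a one-dimensional counterpart of the uniform spectral bound in Lemma 3.3.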



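With the relation (31), we establish the derivative of
$\tilde{\eta}(U(\xi(t)))$ with respect to $t$ as follows:

$$\frac{d\tilde{\eta}(U(\xi(t)))}{dt} = \frac{d\tilde{\eta}(\zeta(t))}{d\zeta(t)} \frac{dU(\xi(t))}{d\xi(t)} \dot{\xi}(t) = J(\tilde{\eta}(\zeta(t))) \frac{dU(\xi(t))}{d\xi(t)} \dot{\xi}(t),$$

where $\frac{dU(\xi(t))}{d\xi(t)}$ is well-defined since $U$ is smooth.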



Remark 3.2: In the paper [6], a relation like (29) between
saddle-points and the parameter is derived from the Karush-Kuhn-Tucker condition. However, the results in [6] are not
applicable to our problem. Firstly, the Lagrangian functions
$\mathcal{L}_i$ and $\mathcal{H}$ are merely concave in $\lambda^{[i]}$ and $\mu$ if $\epsilon, \tau = 0$.
Secondly, the paper [6] assumes that the state-dependent
matrix derived from the Karush-Kuhn-Tucker condition is
uniformly non-singular. This is not easy to check a priori
and may lead to instability in our feedback setup. •

Lemma 3.4: The functions $J(\tilde{\eta})\frac{dU(\xi)}{d\xi}$ and $\nabla\Omega$ are Lipschitz continuous with constants $L_J > 0$ and $L_\Omega > 0$,
respectively, over $Y$.

Proof: Note that $J_M(\tilde{\eta}) J_M(\tilde{\eta})^{-1} = I$. Take the
derivative on $\tilde{\eta}$, and we have



77 A / (77) 
di] 



7m (r?)" 1 + J m (t)) 



dJMjrjY 
drj 



This gives the following relations: 

lidjMW "^|<||7 M (r7)- 1 |||| C % ( ^||||7 M (77)- 1 | 



d?7 7?7 
< (emin{/j 4 ,p-})~ 2 | 



MN-2|| dJ A/(?7) 



drj ' 



By (12) , we derive the following relations: 



dJjrj) dU(Q n ^ n dJ M (tj) 
dii dt; 



< 



< (emm{pi, fy)- 



drj 
77 M (r?) 1 



drj 



\\Jn\ 



dim 

d^ ' 

dim 

' d^ 1 



\J(v) 



d 2 i 1 



< 



\J(v)\\\\ 

(2) 



d 2 U(Q { 
d 2 ^ 1 



(32) 



From (32), we reach the desired Lipschitz constant $L_J$ on
$J(\eta)\frac{dU(\xi)}{d\xi}$ over $Y$. ■



IV. Real-time game theoretic coordination 

In this section, we will present an algorithm for the real- 
time game theoretic coordination. It will be followed by the 
convergence properties of the closed-loop system. 

A. Algorithm statement 

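Denote by $\tilde{\eta}(t)$ the NE of the CVX game parameterized
by $\zeta(t) = U(\xi(t))$, where $\tilde{\eta}(t)$ consists of

$$\tilde{\beta}(t) = \big((\tilde{\beta}^{[i]}_\kappa(t))_{\kappa \in S}\big)_{i \in V}, \quad \tilde{\alpha}(t) \triangleq \big((\tilde{\alpha}^{[i]}_{\kappa\kappa'}(t))_{(\kappa,\kappa') \in \mathcal{E}_S}\big)_{i \in V},$$
$$\tilde{\mu}(t) = (\tilde{\mu}_\kappa(t))_{\kappa \in \{1,\dots,m\}}, \quad \tilde{\lambda}(t) = \big((\tilde{\lambda}^{[i]}_\kappa(t))_{\kappa \in \{1,\dots,p\}}\big)_{i \in V}.$$

At each time $t$, each primal player $i$ maintains the estimates
$(\beta^{[i]}_\kappa(t))_{\kappa \in S}$ and $(\alpha^{[i]}_{\kappa\kappa'}(t))_{(\kappa,\kappa') \in \mathcal{E}_S}$ of $(\tilde{\beta}^{[i]}_\kappa(t))_{\kappa \in S}$ and
$(\tilde{\alpha}^{[i]}_{\kappa\kappa'}(t))_{(\kappa,\kappa') \in \mathcal{E}_S}$. Each dual player $i$ maintains the estimates
$(\lambda^{[i]}_\kappa(t))_{\kappa \in \{1,\dots,p\}}$ of $(\tilde{\lambda}^{[i]}_\kappa(t))_{\kappa \in \{1,\dots,p\}}$. The operator then
maintains the estimate $\mu(t)$ of $\tilde{\mu}(t)$.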

The decision makers update their own estimates
by decreasing the distance to the instantaneous Nash
equilibrium $\tilde{\eta}(t)$ and simultaneously following the
temporal variation of $\tilde{\eta}(t)$. The update rules are
given in Algorithm 1. In particular, the quantity
$v(t) = J(\eta(t), U(\xi(t)))\,\frac{dU(\xi(t))}{d\xi(t)}\,\dot{\xi}(t)$ serves as the estimate



where in the last inequality we use Lemma 3.3 



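of $\frac{d\tilde{\eta}(U(\xi(t)))}{dt}$ in Lemma 3.3 and is decomposed into

$$\big(v_z^{[1]}(t)^T, \dots, v_z^{[N]}(t)^T, v_\mu(t)^T, v_\lambda^{[1]}(t)^T, \dots, v_\lambda^{[N]}(t)^T\big)^T,$$

where $v_z^{[i]}(t)$ (resp. $v_\lambda^{[i]}(t)$) is assigned to primal (resp. dual)
player $i$ and $v_\mu(t)$ is assigned to the operator.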

The quantities $\beta^{[i]}(t)$ and $\alpha^{[i]}(t)$ are intermediate
estimates and may fail to enforce the constraint
$v_\kappa(t) = 0$. To address this, each primal player $i$ obtains
$(\hat{\alpha}^{[i]}(t), \hat{\beta}^{[i]}(t)) = Q_i(b^{[i]}(t))$, where $\hat{\beta}^{[i]}(t) = \beta^{[i]}(t)$ and
$\hat{\alpha}^{[i]}(t)$ is the orthogonal projection of $\alpha^{[i]}(t)$ onto the set
$S^{[i]}(t)$ defined by:

sW(t) 4{ a w e rI^i 1 ^orW = 6W(t), 

where &L' ] (i) - 0{t) - £ K , eA/ - K a^, 1 W and = 
$(b^{[i]}_\kappa(t))_{\kappa \in S}$. After that, all the players in $V$ implement
the control commands $\hat{\beta}^{[i]}_\kappa(t)$ and $\hat{\alpha}^{[i]}_{\kappa\kappa'}(t)$ in the queue
dynamics (3). Here, each primal player $i$ prioritizes the
stabilization of the user queues, and enforces the constraint
$v_\kappa(t) = 0$ by reallocating his empty vehicles.



Remark 4.1: The computation of the orthogonal projec- 
tion can be encoded by the following quadratic program 
which can be solved by a number of existing efficient 
algorithms: 

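$$\min_{\alpha^{[i]} \in \mathbb{R}^{|\mathcal{E}_S|}} \|\alpha^{[i]} - \alpha^{[i]}(t)\|^2$$
$$\text{s.t.} \quad A\alpha^{[i]} = b^{[i]}(t), \quad \alpha^{[i]}_{\kappa\kappa'} \in [a, \alpha_{\max} - a], \quad (\kappa, \kappa') \in \mathcal{E}_S.$$

Algorithm 2 The closed-loop system
1: The dynamics of the user queueing network:

$$\dot{Q}(t) = h_Q(c(t), u(t)),$$
$$\dot{c}(t) = h_c(c(t), Q(t), t),$$

where the controller $u_\kappa(t) = \sum_{i \in V} \hat{\beta}^{[i]}_\kappa(t)$ with $(\hat{\alpha}^{[i]}(t), \hat{\beta}^{[i]}(t)) = Q_i(b^{[i]}(t))$.

2: The dynamics of the vehicle queueing network: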



If $\alpha_{\max}$ is sufficiently large, the computation of the orthogonal
projection can be greatly simplified. Let $\hat{\alpha}^{[i]}(t)$ be the
projection of $\alpha^{[i]}(t)$ onto the solution set of $A\alpha^{[i]} = b^{[i]}(t)$.
Notice that $A$ is orthogonal to all the vectors in the plane of
$A\alpha^{[i]} = b^{[i]}(t)$. Then we have

$$A\big(A\hat{\alpha}^{[i]}(t) - A\alpha^{[i]}(t)\big) = \hat{\alpha}^{[i]}(t) - \alpha^{[i]}(t).$$

That is,

$$\hat{\alpha}^{[i]}(t) = \alpha^{[i]}(t) - A\big(A\alpha^{[i]}(t) - b^{[i]}(t)\big).$$
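The projection onto the solution set of a linear equation can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and assuming the box constraints are inactive (the large-$\alpha_{\max}$ regime). It uses the standard pseudo-inverse formula for projecting onto an affine set, which is a swapped-in general technique rather than the simplified expression stated in the text. The matrix `A`, the vector `b`, and the function name `project_affine` are hypothetical.

```python
import numpy as np

def project_affine(alpha, A, b):
    """Orthogonal projection of alpha onto {x : A @ x = b} (assumed non-empty)."""
    residual = A @ alpha - b
    # pinv(A) @ residual lies in the row space of A, so the correction is
    # orthogonal to every direction that stays inside the affine set.
    return alpha - np.linalg.pinv(A) @ residual

# Hypothetical toy data: two equality constraints on three flow variables.
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([1.0, 0.0])
alpha = np.array([0.3, 0.2, 0.9])
alpha_hat = project_affine(alpha, A, b)
```

When the rows of `A` are orthonormal, the pseudo-inverse reduces to the transpose of `A`, and the correction term takes a form similar to the closed-form expression above.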



The real-time game theoretic regulator is formally stated
in Algorithm 1.

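Algorithm 1 Real-time game theoretic coordinator
Require: Each primal (resp. dual) player $i$ chooses the
initial state $z^{[i]}(0) \in Z_i$ (resp. $\lambda^{[i]}(0) \in \Lambda_i$). The
operator chooses the initial state $\mu(0) \in M$.
Ensure: At each time instant $t \ge 0$, the decision makers
execute the following steps:
1: The operator generates the estimate $v(t) = J(\eta(t), U(\xi(t)))\,\frac{dU(\xi(t))}{d\xi(t)}\,\dot{\xi}(t)$, and updates $\mu(t)$ according
to the following rule:

$$\dot{\mu}(t) = P_M[\mu(t) + \alpha D_\mu(t) + v_\mu(t)] - \mu(t),$$

where $\alpha > 0$ and $D_\mu(t) = \nabla_\mu \mathcal{H}(z(t), \mu(t), U(\xi(t)))$.
The operator then informs player $i$ of $v_z^{[i]}(t)$ and $v_\lambda^{[i]}(t)$.

2: Each primal (resp. dual) player $i$ updates $z^{[i]}(t) = (\beta^{[i]}(t), \alpha^{[i]}(t))$ (resp. $\lambda^{[i]}(t)$) according to the following
rule:

$$\dot{z}^{[i]}(t) = P_{Z_i}[z^{[i]}(t) - \alpha D_z^{[i]}(t) + v_z^{[i]}(t)] - z^{[i]}(t),$$
$$\dot{\lambda}^{[i]}(t) = P_{\Lambda_i}[\lambda^{[i]}(t) + \alpha D_\lambda^{[i]}(t) + v_\lambda^{[i]}(t)] - \lambda^{[i]}(t),$$

where $D_z^{[i]}(t) \triangleq \nabla_{z^{[i]}} \mathcal{L}_i(z(t), \mu(t), \lambda^{[i]}(t), U(\xi(t)))$ and
$D_\lambda^{[i]}(t) \triangleq \nabla_{\lambda^{[i]}} \mathcal{L}_i(z(t), \mu(t), \lambda^{[i]}(t), U(\xi(t)))$.

3: Each player $i$ generates and implements the control
commands $(\hat{\alpha}^{[i]}(t), \hat{\beta}^{[i]}(t)) = Q_i(b^{[i]}(t))$.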
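The update rules in Algorithm 1 share the form "projected gradient step plus a feedforward term, minus the current state". The following is a minimal sketch of that structure on a hypothetical single-agent toy problem, assuming NumPy. The map `D(eta, t) = eta - c(t)`, the box bounds, the target `c(t)`, and the forward-Euler discretization are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]."""
    return np.clip(x, lo, hi)

def simulate(alpha=1.0, dt=1e-3, t_end=20.0, feedforward=True):
    """Euler-integrate eta' = P[eta - alpha*D + v] - eta and return the final tracking error.

    Toy map D(eta, t) = eta - c(t); its zero c(t) plays the role of the
    time-varying equilibrium, and v = c'(t) plays the role of the feedforward term.
    """
    c = lambda t: np.array([1.0 + 0.5 * np.sin(t), 2.0 + 0.5 * np.cos(t)])
    c_dot = lambda t: np.array([0.5 * np.cos(t), -0.5 * np.sin(t)])
    lo, hi = np.zeros(2), 5.0 * np.ones(2)
    eta = np.array([4.0, 0.5])
    t = 0.0
    for _ in range(int(t_end / dt)):
        d = eta - c(t)
        v = c_dot(t) if feedforward else np.zeros(2)
        eta = eta + dt * (project_box(eta - alpha * d + v, lo, hi) - eta)
        t += dt
    return float(np.linalg.norm(eta - c(t)))

err_with_ff = simulate(feedforward=True)
err_without_ff = simulate(feedforward=False)
```

In this toy setting, the run with the feedforward term tracks the moving target closely, while the run without it retains a persistent lag. This mirrors the role the estimate $v(t)$ plays in following the temporal variation of the equilibrium.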



B. The closed-loop system and its performance analysis 

The closed-loop system consists of the user queueing
network (2), (3), the vehicle queueing network (6) and the
real-time game theoretic coordinator (Algorithm 1). For the
sake of completeness, we summarize the closed-loop system
in Algorithm 2.



$$v_\kappa(t) = 0.$$

3: The real-time game theoretic coordinator:

$$\dot{\eta}(t) = P_K[\eta(t) - \alpha D(t) + J(\eta(t), \zeta(t))\chi(t)\dot{\xi}(t)] - \eta(t).$$



The following theorem summarizes the performance of the 
closed-loop system. 



Theorem 4.1: Suppose Assumptions 2.1, 2.2 and 2.3
hold. Suppose the following holds:

tf4 (l-a Pn + a 2 Ll + (LjD^6^ 2 

+ (l + aLn)LjD^6 i ) h < 1. 

The estimates /P(t) and generated by Algorithm [T] 

approximates fj(t) = Xrq(CW) m tne wa Y tnat 

Um |mW(t) — yfirW(t)|| ->0, 

limsup||aW(t)-a I<] (t)|| < \\A\\sG{e,T), (33) 
t— y+oo 

Furthermore, the queue dynamics achieves the following: 

J, (34) 



limsup \\Q K (t) - Q K \\ < max{A min , A ma>; , 



where A E 
and A r 



4 ln(l 
-ln(l 



A, 



2<r G (e,-r) 



)) 



Remark 4.2: Recall that $\rho_\Omega = \min\{\min_{i \in V}\{\rho_i, \rho_i'\}, \epsilon\}$
and $\delta_\xi$ represents an upper bound on the maximum variation
of the user arrival rate. If $\delta_\xi$ is smaller, we can choose a
set of smaller $\alpha$, $\epsilon$ and $\tau$ to satisfy the condition of Theorem 4.1 and reduce the
right-hand sides of (33) and (34). That is, if the maximum
variation of the user arrival rates is smaller, the steady-state
system performance can be improved. •
Proof: We divide the proof into several claims.

Claim 1: It holds that $\|v(t)\| \le \delta_v$ for all $t \ge 0$.
Proof: It is noted that

||«(t)||<||J(»?(*),^(t)))|||| ^IplMH 



< 



\\Jn\\D^6 5 



e mm ieV {pi, p^} 
where we use Lemma 3.1, the relation (28) and Lemma 3.3.



Claim 2: It holds that $\mu_\kappa(t) \in [0, \Delta_\mu]$ and $\lambda^{[i]}_\kappa(t) \in [0, \Delta_\lambda]$ for all $t \ge 0$.

Proof: The dynamics associated with $\mu_\kappa$ can be written
as follows:

$$\dot{\mu}_\kappa(t) = -\mu_\kappa(t) + d_\kappa(t), \quad (35)$$

where $d_\kappa(t) \in [0, \Delta_\mu]$ for all $t \ge 0$. It is readily seen that
$[0, \Delta_\mu]$ is an invariant set of (35) and thus $\mu_\kappa(t) \in [0, \Delta_\mu]$ for
all $t \ge 0$. Analogously, one can verify that $\lambda^{[i]}_\kappa(t) \in [0, \Delta_\lambda]$
for all $t \ge 0$.
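The invariance argument can be illustrated with a short simulation. This is a minimal sketch, assuming NumPy, a forward-Euler discretization with step `dt < 1`, and a randomly drawn input confined to the interval. The names `simulate_mu` and `delta` and the choice of random input are hypothetical and only serve the illustration.

```python
import numpy as np

def simulate_mu(mu0, delta, dt=0.01, steps=5000, seed=0):
    """Euler-integrate mu' = -mu + d(t) with d(t) drawn from [0, delta]."""
    rng = np.random.default_rng(seed)
    mu = mu0
    path = [mu]
    for _ in range(steps):
        d = rng.uniform(0.0, delta)
        # (1 - dt) * mu + dt * d is a convex combination of two points
        # of [0, delta], so the iterate cannot leave the interval.
        mu = mu + dt * (-mu + d)
        path.append(mu)
    return np.array(path)

path = simulate_mu(mu0=0.3, delta=1.0)
```

Every sample of `path` stays inside `[0, delta]`, which is the discrete-time counterpart of the invariant-set property used in the proof.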



Claim 3: It holds that s\ 



< < ^max and 



;['] 



/3,min 



4 l !mi„ < «i!U*) ^ for alH G F and f ^ °- 

Proof: The dynamics associated with /% J can be written 
as follows: 



/f (*) 



/f(t) Ana* 



(36) 



where 



<f (t) = -V^/i^^ + ^wMV^^^t),^))) 

£=1 

+ EAf ] (t)V^ ] ^(^W(*))+^ 1 W- 
N ote t hat d$(i) < (2,9. Analogous to (P5) of Proposi- 



tion [3_1| one can show that df nlin < f3^ ] (t) < <5g max . 
Analogously, it holds that <$j*] min < a l ^(t) < ■ 
Claim 4: It holds that fj, K (t) G [^ Iiini „, ^,max] and 

,min j ^A,max] • 

Proof: The dynamics associated with can be written 
as follows: 

£«(*) = Pm[M*) - ( e M K W - G K (fi(t),C*(t)) - -4tv) 



•««(*)]-/*«(*)• 



(37) 



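Let $u(t) = \nabla\Omega(\tilde{\eta}(t), \zeta(t))$, and $\chi(t) = \frac{dU(\xi(t))}{d\xi(t)}$. From the
definition of $K$, one can see that $P_K$ never applies and thus,

$$P_K[\tilde{\eta}(t) - \alpha u(t) + J(\tilde{\eta}(t))\chi(t)\dot{\xi}(t)] = \tilde{\eta}(t) - \alpha u(t) + J(\tilde{\eta}(t))\chi(t)\dot{\xi}(t).$$

This implies the following temporal evolution of $\tilde{\eta}(t)$:

$$\frac{d\tilde{\eta}(t)}{dt} = J(\tilde{\eta}(t))\chi(t)\dot{\xi}(t) = P_K[\tilde{\eta}(t) - \alpha u(t) + J(\tilde{\eta}(t))\chi(t)\dot{\xi}(t)] - \tilde{\eta}(t). \quad (38)$$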

The combination of Lemma 3.4 and Claims 5 and 6
implies that

$$\|J(\eta(t)) - J(\tilde{\eta}(t))\| \le L_J \|\eta(t) - \tilde{\eta}(t)\|, \quad (39)$$
$$\|D(t) - u(t)\| \le L_\Omega \|\eta(t) - \tilde{\eta}(t)\|. \quad (40)$$

We recall the regulator for players as follows:

$$\dot{\eta}(t) = P_K[\eta(t) - \alpha D(t) + J(\eta(t), \zeta(t))\chi(t)\dot{\xi}(t)] - \eta(t). \quad (41)$$

Choose the Lyapunov function candidate $W(\eta(t), \tilde{\eta}(t)) = \frac{1}{2}\|\eta(t) - \tilde{\eta}(t)\|^2$ for system (41). The following claim provides
an estimate of $\dot{W}$.

Claim 5: The following estimate holds: 

W <2(-l + + a)i)W + T(t), (42) 

Proof: It follows from (38) and (41) that



w = (rj(t)-m,m 



dt 



(43) 



= -h(*)-*K*)ll 2 + *(*), 

where the term ^(t) is given by: 

P K [ V (t) - aD(t) + Jfa(t),C(*)M*)£(*)] 
-P K [rj(t)-au(t) + J(fj(t))x(m(t)}). 

By the non-expansiveness property of Pk, we have 

||Px fo(t) - aD(t) + J( V (t), C(t)) X (t)i(t)} 
-P K [fj(t)-au(t) + jm))xm(t)]f 

< \\(v(t)-m)-^(D(t)- U (t)) 

+ (j(rmx(t))x(m)-j(m)x(m))\\ 2 

< Ht) - m\\ 2 - ^{t) - m,D(t) - u (t)> 

+ a 2 \\D{t)-u{t)\\ 2 

+ \\j(r 1 (t)x(t))x(m)~j(m)x(m)\\ 2 
+ \\v(t)-m\\ 

x i|j(»?(t),c(t))x(*)e(*)-^(*))x(*)e(*)ii 

+ a||D(t)-u(t)|| 

x ii j(»?(t), c(t))x(*)ew - (44) 

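By the strong monotonicity, we have the following for the
second term on the right-hand side of (44):

$$\langle \eta(t) - \tilde{\eta}(t), -(D(t) - u(t)) \rangle \le -\rho_\Omega \|\eta(t) - \tilde{\eta}(t)\|^2. \quad (45)$$

We have the following for the last three terms on the right-hand
side of (44):

$$\|J(\eta(t), \zeta(t))\chi(t)\dot{\xi}(t) - J(\tilde{\eta}(t))\chi(t)\dot{\xi}(t)\| \le L_J \|\chi(t)\dot{\xi}(t)\| \|\eta(t) - \tilde{\eta}(t)\|. \quad (46)$$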



Substitute (45), (40) and (46) into (44). After grouping,
we have the following:

||P*fo(t) - aD(t) + J(rj(t),at))x(t)i(t)} 

- F K [ij{t) - au(t) + Jm))x(tW)}f 

< (l-apn+a 2 L 2 n +L 2 j(D^5 £ ) 2 

+ (1 + aL^LjD^S^Mt) - f,(t)\\ 2 . (47) 

This estimate further implies the following estimate of ^(t): 
Wt)\\ <(l-a P n+a 2 Ll +L 2 {D^5tf 
+ (1 + aLn)Lj(D^6 ( )^Ht) - f,{t)\\ 2 - (48) 



Claim 4 immediately implies the following convergence
property:

$$\lim_{t \to +\infty} \|\eta(t) - \tilde{\eta}(t)\| = 0. \quad (49)$$

From Proposition 3.1 it follows that $\tilde{\eta}(t) \in \mathbb{X}_{RG}(\zeta(t))$ is
an approximation of $\mathbb{X}_C(\xi(t))$. Since the function $[G_\kappa(\cdot)]^+$
is continuous, it follows from Proposition 3.1 and (49)
that



V. Conclusions 



In this paper, we have introduced a model of competitive
MoD systems and formulated a real-time game theoretic
coordination problem for the system. We have proposed an
algorithm to achieve vehicle balance and practical regulation
of the user queueing network.



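$$\limsup_{t \to +\infty} \|G_\kappa(x(t), \zeta(t))\| \le \varsigma_G(\epsilon, \tau). \quad (50)$$

Hence, the controller for the queue network can be written
in the following way:

$$u_\kappa(t) = U_\kappa(\xi(t)) + \Delta_\kappa(t),$$

where the perturbation term $\Delta_\kappa(t)$ satisfies

$$\limsup_{t \to +\infty} \|\Delta_\kappa(t)\| \le \varsigma_G(\epsilon, \tau). \quad (51)$$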



VI. Appendix 



A. Notations 

In this section, we summarize the notations used in
Sections III and IV.



Claim 7: The relation (34) holds.

Proof: Consider the Lyapunov function candidate
$V_\kappa(Q_\kappa) = \frac{1}{2}(Q_\kappa - \bar{Q}_\kappa)^2$. Its Lie derivative along (2) is given
by:



1) Notations for Section III Denote Z$ = {z^ g 

K"* | € [a,/?Ex -a], k e S, a l * ] K , g [a,aS ax - 
a], (k, k') G £§}, Z 4 n, ey Z 4 , Z 4 4 {zW g R»* | g 
[0,/SSLx], K€S, «Li e [0,ali x ], (k, k') g £g} and 
z = Tliev Z f 



dQ K 



Ok = (Q K (t)-Q K )Q K 

= (Q K (t)-Q K )(U K m)) + & K (t)) 
= (Q K (t)-Q K )(x(e- (Q ^ t) -^ ) -1) 



£>^ } = ISIfl 



-'max L max 



+ c^_ Na 



A„(t)). 
(52) 



(1 + 2 



2(y? 



max ^max 



J^fO masses q„ 



When Q K (t) - Q K > A max , we have 



(2) A (Anax — C max + ™° — Act) 



2(/3„ 



-JVa) 



^n^^ma^^ ^( e -W~(*)-^)-l)+A(t) 

^- c -^- Q e -(0.(t)-OJ v 7 w 



< (# 



max L max 



x(l + ( 



209 



max u max 



Crrmv A^ £j) 



C m ax — -/Va ^4 



Hence, the following holds for Q K (t) — Q R > — ln(l 

2w(e,r) \. 
/3 max -c m ax-a' 1 - 



1] 



Analogously, when Q K (t) — Q K < — A m ; n , we have 

/^max *-max ^ 



(53) 



1 



> 



max b max 



» e -(Q«(t)-<5«) 

Cmax f* 



( e -W»(*)-o»)_i) + A(t (t) 



<?(s) 4 ( s + v / s 2 + 4er) ) 

5,4 sup inf / t (z' ! ') 

+ 2r(|S|^(^ max ) + |£s|^(om-x)). 



1 + 



(e-W-W-«-)-l)- ?G (e,r) 



T ) = max V e (^ + (P - !) r )> 



Hence, the following holds for Q K (t) — Q K < — A n 
^^Qk < -?G(e,T)A min . 



(54) 



The combination of (53) and (54) establishes the desired result
of (34). ■



?Gf(e, t) — max y/Ne(8i +~pr), 
<^4 SU p ||V 

+ 2|S| max{ S (-c G (e, r)), 5 fe(e, r))} 
+ 2|S| max{.g(-ft(e, r)),g(%(e, r))}, 
£'4 sup ||V r fl /i^WjH 



This completes the proof of Theorem 4.1.



sup 
•2|S|m. 



for any T>K 



V 

A, 4 fl (? G (e,r)) + (emin{ Pi , ^})- 1 || JjvpW^, 
A A 4 .gMe,T)) + (emin{ Pi , 11^ II 4% 
dp 4 max sup HV^ra/i^W)!! + 2|§|(A (tl + A A ), 



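$$\|\bar{f}(T) - \bar{g}(T)\| \le \frac{1}{T}\int_0^K \|f(t) - g(t)\|\,dt + \frac{1}{T}\int_K^T \|f(t) - g(t)\|\,dt \le \frac{1}{T}\int_0^K \|f(t) - g(t)\|\,dt + \frac{\epsilon (T - K)}{T}. \quad (55)$$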



rfa = max sup ||V ,,, /.(z^H + 2|S|A A , 

4ey zWgZ, 



Recall that $f$, $g$ are uniformly bounded. Take the limit on
$T$ in (55), and it renders that



= max sup 

KGS z6%€[ £l f 1 ,fln,.x-A'a] 

c?a — max max sup h$(z) + S v . 



$$\limsup_{T \to +\infty} \|\bar{f}(T) - \bar{g}(T)\| \le \epsilon. \quad (56)$$



Since 
result. 



rW A 
/3,min 



tB [ ' 



r/3 W 

' pmax 



2r + dpPSLc 
[i] 

TCtmax 



2r + d/jflffiLx 
TQ!max 



a,min r^l " 

2t -\- d, a ft max 



u fj,,min 
^A.min 



5 — A 

^A.max — ^A? 



2r + d a [l] ' [3] 

[4] 
[5] 



y = {ry G R" | /Sj? G [5 



■['] 



/3,min ' /3,maxJ ' 



kk' I Q,min' a. max J ' r 



,min i ^/i,max ], A^ 1 G [^minA.max]}- 



drj 

x ( emi T n {ft,^}r 2 | 



sup | 



4r 2 ( 



-) 



JnWD 



i>} ^^Hp*,p-}\\JN\\D { J\ 



e mini 



sup 

i|6F 



drj 



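2) Notations for Section IV: We associate the incidence
matrix $A \in \mathbb{R}^{|S| \times 2|S|}$ for the graph $\mathcal{G}_S$. In particular, the
$\kappa$-th row is assigned to state $\kappa$, and is in the form of
$[a_{\kappa 1} \cdots a_{\kappa |S|} \; a_{1\kappa} \cdots a_{|S|\kappa}]$. If $\kappa' \in \mathcal{N}_\kappa$, then $a_{\kappa\kappa'} = -1$;
if $\kappa' \in \mathcal{N}_\kappa$, then $a_{\kappa'\kappa} = 1$; $a_{\kappa\kappa'} = 0$, otherwise.

$$\Lambda_i \triangleq \{\lambda^{[i]} \in \mathbb{R}^{|S|} \mid \lambda^{[i]}_\kappa \in [0, \Delta_\lambda], \ \forall \kappa \in S\},$$
$$M = \{\mu \in \mathbb{R}^{|S|} \mid \mu_\kappa \in [0, \Delta_\mu], \ \forall \kappa \in S\}.$$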

B. An instrumental result 

The following lemma shows that the infinite-horizon
averages of two functions are identical if the two functions
asymptotically approach each other.

Lemma 6.1: Consider the functions $f, g : \mathbb{R}_{\ge 0} \to \mathbb{R}$
which are uniformly bounded. If $\lim_{t \to +\infty} \|f(t) - g(t)\| = 0$,
then it holds that $\lim_{T \to +\infty} \|\bar{f}(T) - \bar{g}(T)\| = 0$, where
$\bar{f}(T) = \frac{1}{T}\int_0^T f(t)\,dt$ and $\bar{g}(T) = \frac{1}{T}\int_0^T g(t)\,dt$.
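The statement of Lemma 6.1 can be illustrated numerically. The following is a minimal sketch, assuming NumPy. The functions `f` and `g`, the trapezoidal quadrature, and the helper names `time_average` and `average_gap` are hypothetical choices for illustration: here `f - g = 1/(1+t)` vanishes as `t` grows, and the gap between the running averages should shrink with the horizon.

```python
import numpy as np

def time_average(func, horizon, n=200_001):
    """Approximate (1/T) * integral_0^T func(t) dt with the trapezoidal rule."""
    t = np.linspace(0.0, horizon, n)
    y = func(t)
    dt = t[1] - t[0]
    integral = dt * (y.sum() - 0.5 * (y[0] + y[-1]))
    return integral / horizon

# Two bounded functions whose difference 1/(1+t) tends to zero.
f = lambda t: np.cos(t) + 1.0 / (1.0 + t)
g = lambda t: np.cos(t)

def average_gap(horizon):
    """Absolute difference between the two running averages at the given horizon."""
    return abs(time_average(f, horizon) - time_average(g, horizon))

gap_short = average_gap(10.0)
gap_long = average_gap(1000.0)
```

For this pair the gap equals $\ln(1+T)/T$ up to quadrature error, so it decays slowly but does tend to zero, in line with the lemma.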

Proof: Pick any $\epsilon > 0$; there is $K \ge 0$ such that
$\|f(t) - g(t)\| \le \epsilon$ for all $t \ge K$. Then the following holds



holds for any e > 0, we then reach the desired 